Notes on Quantitative Aspects of Black-White Equity in Grades 3, 4, and 5 Elementary Education in North Carolina Public Schools

Introduction

Policy-makers, educators, researchers, advocacy groups, and, via newspapers, the public make use of K-12 student achievement data. The Grade Level Proficiency (GLP) categorization is among the most frequently cited quantitative criteria. In this report I make observations, based on GLP data, about quantitative aspects of education equity between Black and White students in North Carolina public and public charter schools. The sources of data are the publicly available “disaggregated” files provided by the North Carolina Department of Public Instruction (NCDPI) on the achievement of students during the academic years 2013-14 through 2016-17, and ancillary publicly available NCDPI data. While the NCDPI data is extensive, it is not longitudinal; that is, it does not support following individual students from year to year. However, students in grades 3, 4, and 5 take standardized examinations that are reported in the NCDPI data, and the comparability of this data provides a kind of quasi-cohort that allows for a coherent analysis of students in these grades. This report will focus on achievement data for grades 3, 4, and 5.

Students are classified into five achievement levels based on their scores in the grades 3, 4, and 5 tests. The publicly available NCDPI data is FERPA-compliant, consequently test scores and numbers of students are reported by grade, not class, for each school. Since FERPA requires the masking of some detailed data, I will look at the reporting of Grade Level Proficiency (GLP), which is determined by students scoring into the upper three levels of these five-level standardized tests. This avoids almost all the masked data, although it prevents detailed analysis of the individual levels.

The NCDPI disaggregated data files are organized in univariate categories including ethnicity and gender. Importantly for this analysis, there is also a self-reported category of economically disadvantaged status (EDS, as Yes or No). These categories being univariate, there is no way to identify students as, for example, Black and EDS, Hispanic and not EDS, male and Hispanic, etc. Ancillary data associates Title I status (Yes or No) with each school. There is a strong association between Title I status and a school’s overall EDS percentage: roughly, if a school is at least 40% EDS, then it will classify as a Title I school. I will make use of this relationship by treating Title I as a binary indicator of the magnitude of the continuous EDS measure.
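As a small illustration, this binary use of Title I can be sketched as a threshold rule. The 40% cutoff is the rough figure cited above, not NCDPI’s actual eligibility formula, and the two example schools (with EDS percentages taken from the tables in section I.5) serve only for illustration:

```python
# Sketch of the Title I / EDS relationship described above. The 40%
# threshold is a rough heuristic, not NCDPI's Title I eligibility rule.

def title1_indicator(pct_eds: float) -> bool:
    """Binary indicator of the continuous EDS measure: True roughly
    corresponds to a school qualifying as Title I."""
    return pct_eds >= 40.0

# Example schools (name -> overall EDS percentage, from section I.5).
schools = {"Riverbend": 56.0, "Ogden": 19.0}
indicators = {name: title1_indicator(p) for name, p in schools.items()}
```

Under this rule Riverbend (Title I in the NCDPI data) is flagged and Ogden is not, consistent with the tables below.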

My analysis comprises two parts. In the first, I confirm what is commonly understood by educators and analysts who have looked closely at North Carolina student achievement scores: namely, that North Carolina is effectively two states, partitioned by wealth and race. The contribution of this report is in the analysis and visualization of various aspects of this partitioning, and also in inspection of the grade 3, 4, and 5 standardized tests. In the second part, I consider the utility of the grade 3, 4, and 5 test suite, as made available in the NCDPI data files. I observe that the publicly available data yields reliable equity estimates in the aggregate, but is unreliable in detail.

Part I. North Carolina as Two States

I.1. Wealth and Ethnicity

I begin with the use of aggregated data in order to take advantage of what is sometimes called the law of large numbers: with enough observations, sample averages converge to the underlying means. For the present purposes, I look at the means and quantiles of GLP%. A standard way of presenting this is by using Tukey box plots. In these plots, the lower edge of the box is the 25th percentile, the upper edge the 75th percentile, and the horizontal line in the middle of the box is the 50th percentile (the median). The “whiskers” extend from the box edges to the most extreme points lying within 1.5 times the interquartile range (the distance between the 75th and 25th percentiles). Points beyond the whiskers are referred to as outliers. This does not mean that these points are in some way dubious, just that they lie far from the bulk of the data.
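As a concrete illustration, the box-plot quantities can be computed directly. This is a sketch using NumPy with invented GLP% values; the whiskers follow Tukey’s conventional rule of extending to the most extreme observations within 1.5 times the interquartile range of the box edges:

```python
import numpy as np

# Invented GLP% sample, for illustration only.
glp = np.array([22, 35, 41, 44, 48, 52, 55, 58, 61, 67, 74, 96], dtype=float)

q1, median, q3 = np.percentile(glp, [25, 50, 75])
iqr = q3 - q1  # height of the box

# Tukey fences: 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers run to the most extreme observations inside the fences.
lower_whisker = glp[glp >= lower_fence].min()
upper_whisker = glp[glp <= upper_fence].max()

# Anything beyond the fences is plotted as an individual outlier point.
outliers = glp[(glp < lower_fence) | (glp > upper_fence)]
```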

The following figures present a substantiation of the “two North Carolinas” assertion.

Figure I.1. …


Figure I.1 selects one year (2015-16) and one subject (Reading) across all NC schools, and presents GLP% box plots for grades 3, 4, and 5. The two left-most boxes for each grade are the EDS and non-EDS GLP%, while the three right-most boxes show GLP% for Black, Hispanic, and White students. There is in addition a box for the cumulative GLP%. Plots for other years, as well as for Mathematics, appear much the same, and observations for those years are substantially the same. Plots for the other years can be found in Appendix ???. Keep in mind that, for FERPA compliance, GLP% above 95% is always shown as 97.5%, while GLP% below 5% is shown as 2.5%. This results in some lumpiness at the top and bottom of the GLP% axis.
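The reporting floor and ceiling can be expressed as a simple clamp. This is a sketch of the convention used in these plots, where suppressed values above 95% are displayed as 97.5% and values below 5% as 2.5%; NCDPI’s actual suppression procedure is more involved than this:

```python
def display_glp(glp_pct: float) -> float:
    """Convert a GLP% to the value plotted here: FERPA suppression
    hides values above 95% and below 5%, which are displayed as the
    interval midpoints 97.5% and 2.5%."""
    if glp_pct > 95.0:
        return 97.5
    if glp_pct < 5.0:
        return 2.5
    return glp_pct
```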

Figure I.2. …


Figure I.2 selects one grade (5) and one subject (Reading) across the four academic years from 2013-14 to 2016-17.

Figure I.3. …


Figure I.3 tracks a cohort, starting with Reading for grade 3 in 2014-15 and following it to grade 5 in 2016-17. Once again, plots run for other years, as well as for Mathematics, appear much the same.

From these graphics we observe that reducing reporting or evaluation to a single, state-wide number is of little utility. Whether the ALL category percentiles shift up or down, and by how much, might be due to changes up or down in the White or the Black GLP%. Looked at from the point of view of EDS, changes in ALL could be due to increases or decreases in the EDS or the non-EDS GLP%. Relying on the ALL category obscures underlying causes and can be misleading.

It is also clear that the separation between the EDS categories and the separation between White students and Black (as well as Hispanic) students resemble one another. As mentioned earlier, the NCDPI data does not allow us to associate EDS with any ethnicity.

I.2. GLP% Distributions

In this section I first look more closely at the GLP% profiles for the ethnicities Black, Hispanic, and White, and also at the EDS Yes-No categories. Figure I.4 shows these categories in histograms of GLP% for 2016-17 Reading for grade 5 over all schools.

Figure I.4. 2016-17 Reading for Grade 5


It is evident that the distribution for Not-EDS students differs structurally from those for the other categories. Not only are the means clearly different, but the shapes are different. The EDS distribution is reasonably described as symmetrical and approximately normal, while the Not-EDS distribution is skewed toward higher percentages. The vertical bar in the Not-EDS and White histograms is an artifact of FERPA compliance and should be imagined as spread out over the 95% to 100% interval.

The differences between the Black, Hispanic, and White students are less pronounced. The Black and Hispanic distributions have long tails into higher GLP%. It would be of great value to be able to determine whether these tails are associated with Not-EDS students, but that data is not publicly available.

Improvements in GLP% for Not-EDS could be expressed as ‘sweeping’ the left-hand tail into the mass of the data. Improvements for the other categories could be right-directed sweeping, or shifts of the entire distribution toward higher GLP%. This contrast reinforces the disutility of speaking in terms of summary data, i.e., using the ALL category.

I.3. Is There “Sharing” Based on Wealth Indicators?

It is most direct to quote the NC Department of Commerce website here: “The N.C. Department of Commerce annually ranks the state’s 100 counties based on economic well-being and assigns each a Tier designation. The 40 most distressed counties are designated as Tier 1, the next 40 as Tier 2 and the 20 least distressed as Tier 3.” Figure I.5 replicates Figure I.1 except that it presents data only for the Tier 3 counties. It appears that the EDS and ethnicity differences are very much the same as those found when looking at data for the entire state (Figure I.1).

Figure I.5. …


I.4. ???

I treat here some matters of equity by looking at the GLP% for Black students in schools where the number of White students exceeds the number of Black students. Following are two interactive scatter plots for grades 3, 4, and 5 in the year 2016-17. In the interest of brevity, I treat the cumulative grades 3, 4, and 5 GLP% across all North Carolina schools. Placing the cursor on a point displays the school code. Schools above the diagonal line have GLP% for Black students exceeding that for White students.

IMPORTANT NOTE: The smaller the count of students, the less dependable the GLP%. That is, since we are looking at percentages, the GLP% for student counts under twenty or so can be substantially affected if only one or two students move from GLP to not GLP, or vice versa. I have not included error bars in the scatter plots because they would make the plots very cluttered and difficult to read.

Figure I.6. …


The interactive scatter plot provides the opportunity to pursue further the best and worst performing schools.

Figure I.7 is similar to Figure I.6, but shows a comparison of EDS and not-EDS students. The same striking dichotomy persists.

Figure I.7. …


Figure I.8 addresses a follow-up question: how do Black students perform in the better performing, predominantly White schools? In this plot the schools are limited to those where 1) the White GLP% was at least 75% and 2) there were more White than Black students. There does not appear to be any correlation between the scores of Black and White students. Put another way, Black students do not appear to consistently benefit from being in high-performing, majority White schools; the GLP% of the Black students is as likely to be high as low. The factors that contribute to high performance for Black (and for EDS) students go beyond a simplistic attribution to a high White proportion. The interactive scatter plot can be used to identify the schools in which the Black students were best and worst performing.

Figure I.8. …


I.5 Identifying Some Schools With High Achievement for EDS Students

In this section I redirect and broaden attention to the EDS category. This provides the opportunity to identify the schools in the right-hand tail of the EDS plot in Figure I.4. I ask the following question: in which schools are EDS students repeatedly achieving high GLP scores in grades 3, 4, and 5? In order to quantify this I make the question more precise, although it does become somewhat restrictive. The question would be best served by GLMM modelling, but I defer that to a later report.

The question begins with recognizing that NCDPI makes available comparable GLP% data for three grades (3, 4, and 5), two subjects (mathematics and reading), and four academic years (2013-14 through 2016-17). That provides twenty-four possible reports of GLP%. The criterion I specify, somewhat arbitrarily, is the following: a school will be considered notable if its EDS students achieve at least 75% GLP in three of the six grade-subject combinations (three grades and two subjects) in at least two of the years. This is carried out irrespective of the particular grade and subject, i.e., it is across all grades and subjects. Note that otherwise successful schools may be excluded if they do not have students in all of the grades (for instance, KIPP Gastonia), or if they have not been in existence for all four years.
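The criterion can be made concrete in code. This is a sketch: the data layout, one list of up to six grade-subject GLP% values per year, is an assumption about how the NCDPI files look after cleaning, and the example values are invented:

```python
def is_notable(results_by_year: dict, threshold: float = 75.0,
               min_hits: int = 3, min_years: int = 2) -> bool:
    """Notable if, in at least `min_years` years, at least `min_hits`
    of the six grade-subject GLP% values meet the threshold."""
    qualifying_years = sum(
        1 for glps in results_by_year.values()
        if sum(1 for g in glps if g >= threshold) >= min_hits
    )
    return qualifying_years >= min_years

# Invented school: year -> GLP% for grades 3, 4, 5 x reading, math.
school = {
    "2013-14": [78, 80, 76, 60, 71, 65],  # three values >= 75: qualifies
    "2014-15": [74, 70, 68, 66, 72, 73],  # none: does not qualify
    "2015-16": [81, 77, 75, 70, 69, 74],  # three values >= 75: qualifies
    "2016-17": [72, 74, 70, 68, 66, 71],
}
```

With two qualifying years, this invented school would be notable with nyears = 2.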

Here are some results. “total” is the school enrollment across all grades, not just 3, 4, and 5. “pct_EDS” is also across all grades. “nyears” is the number of years, over 2013-14 through 2016-17, in which the criterion of three instances of GLP% of at least 75% is satisfied.

A total of 32 schools out of the approximately 1490 elementary schools qualify as notable. Notice that some of the charter schools qualify for Title I assistance even though their overall EDS percentage is low.

These are the schools that had four years of high GLP% for EDS students over the years 2013-14 through 2016-17, as described above:

nyears district school_name school_code total pct_EDS TitleI charter
4 Haywood County Schools Riverbend Elementary 440332 212 56 Yes
4 New Hanover County Schools Ogden Elementary 650356 698 19
4 Yancey County Schools Bald Creek Elementary 995304 161 58 Yes

These schools had three years of high GLP% for EDS students over the years 2013-14 through 2016-17:

nyears district school_name school_code total pct_EDS TitleI charter
3 Burke County Schools Rutherford College Elem 120372 229 50 Yes
3 Charlotte-Mecklenburg Schools McKee Road Elementary 600451 550 10
3 Henderson County Schools Glenn C Marlow Elementary 450339 559 37 Yes
3 Rutherford County Schools Thomas Jefferson Class Academy 81A000 1300 19 Yes
3 Wake County Schools Cedar Fork Elementary 920369 1086 9
3 Wake County Schools Davis Drive Elementary 920390 1096 6

These schools had two years of high GLP% for EDS students over the years 2013-14 through 2016-17 (this list has less statistical significance than the preceding lists):

nyears district school_name school_code total pct_EDS TitleI charter
2 Avery County Schools Crossnore Elementary 060316 220 60 Yes
2 Buncombe County Schools Evergreen Community Charter 11A000 442 35 Yes Yes
2 Buncombe County Schools Pisgah Elementary 110388 233 73 Yes
2 Cabarrus County Schools Cabarrus Charter Academy 13B000 1700 14 Yes Yes
2 Carteret County Public Schools Tiller School 16B000 205 18 Yes
2 Charlotte-Mecklenburg Schools Hawk Ridge Elementary 600406 932 6
2 Charlotte-Mecklenburg Schools Olde Providence Elementary 600491 733 12
2 Charlotte-Mecklenburg Schools Polo Ridge Elementary 600392 1081 2
2 Charlotte-Mecklenburg Schools Providence Spring Elementary 600507 903 3
2 Forsyth County Schools Lewisville Elementary 340432 626 27
2 Gaston County Schools Mountain Island Charter School 36C000 1270 21 Yes Yes
2 Haywood County Schools Jonathan Valley Elementary 440349 364 55 Yes
2 Henderson County Schools Clear Creek Elementary 450307 513 66 Yes
2 Henderson County Schools Hendersonville Elementary 450333 427 35 Yes
2 New Hanover County Schools Dr John Codington Elem 650366 567 23
2 Polk County Schools Polk Central Elementary School 750314 364 81 Yes
2 Transylvania County Schools Brevard Academy 88A000 321 46 Yes Yes
2 Union County Public Schools Poplin Elementary 900347 831 29
2 Union County Public Schools Union Academy 90A000 1386 11 Yes Yes
2 Union County Public Schools Weddington Elementary 900376 770 5
2 Wake County Schools Holly Grove Elementary 920457 1198 10
2 Wake County Schools Triangle Math and Science Academy 92T000 808 15 Yes
2 Watauga County Schools Two Rivers Community School 95A000 173 46 Yes Yes

Part II. The Nature of the Grade 3, 4, 5 Tests

II.1. Score-Frequency Tables

The extent of aggregation in the publicly available NCDPI data, in response to FERPA requirements, makes it difficult to use this data to fully understand the characteristics of the tests. However, we can make some useful observations. The NCDPI Green Books provide score-frequency tables for the tests from 2013-14 to 2015-16 (this will be extended to 2016-17 later this year).

There is, of course, a different test each year for each grade and for each of the two subjects, ELA (Reading) and Mathematics. The tests are constructed by educators and psychometricians. The process of validation is not, to my knowledge, documented in any publicly accessible way. The tests are used to classify students into five levels. Students testing into any of the three higher levels are considered to be grade level proficient (GLP). If they test into the two highest levels they are also considered to be career and college ready (CCR). The weighted median score lies in Level 3, that is, within GLP but not within CCR.
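The level-to-category mapping can be stated compactly. This sketch records only the designations described above; the underlying cut scores vary by test and are not reproduced here:

```python
def categories(level: int) -> dict:
    """Map an achievement level (1-5) to the designations described
    above: levels 3-5 are GLP, and levels 4-5 are also CCR."""
    if not 1 <= level <= 5:
        raise ValueError("achievement level must be 1..5")
    return {"GLP": level >= 3, "CCR": level >= 4}
```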

Figure II.1 is a visualization of a score-frequency table, one of the eighteen tests covering the 2013-14 to 2015-16 years. While all the tests are similar, there are significant differences in their details. An extensive exposition of the available score-frequency data is presented in the “Score Frequency” Appendix.

Figure II.1. …


Salient characteristics of the tests include the small range of scores, about 60 points out of the maximum 470, into which students actually fall. A similar scaling methodology is used in many tests, including those for higher grades such as the SAT. Nevertheless, we are looking at 60 points, not 470. It is also evident that there is a step, or discontinuity, that ‘guards’ Level 3 (the start of GLP). Level 3 is narrow, only two or three points. It does not appear to be a ‘trap’ of consequence, since the counts of students in Level 4 (the start of CCR) are high.

II.2. Reliability of the Tests

The NCDPI score-frequency data gives the impression of precision, of exactness and completeness. However, students being susceptible to illness, emotional stress, and so on, the same tests given to the same students on different days would produce at least slightly different results. There are also influences such as a change in principal, teacher turnover, shifts in student ethnic composition, the number of students who need English language development, and so on. This is expressed in the randomness associated with almost all testing. There is also sampling variability: schools with very small numbers of students in an ethnic category may show exaggerated swings. In an aggregate of roughly 1,400 schools some small differences may not be of consequence, but that might not hold when looking more closely at these tests.

The classification of students as grade level proficient and the reporting of GLP percentages are consequences of the tests. Thus far in these notes I have been considering aggregations of data, taking advantage of “properties of large numbers.” Now I will consider two questions and show that they have at best doubtful answers. First, how much confidence can be put in the GLP% for the same school, for the same grade and subject, from year to year? That is, when tracking grades and subjects from year to year and looking at individual schools, how consistent are the GLP%? The implication of a negative answer is that the GLP% scores may themselves be of questionable utility. This question is well served by a GLMM analysis, which I will undertake in a subsequent report.

Another consideration is whether this data can be used to compare year-to-year changes between schools. If policy decisions are based on comparisons of GLP% between schools, then the randomness apparent in these plots casts doubt on their utility.

Figure II.2 compares the changes in GLP% by school from one year to the next, for instance, from 2014-15 to 2015-16, to the changes in the succeeding year, 2015-16 to 2016-17. Additional plots are in the “Plot Compares” Appendix.

Figure II.2. All Schools Math Grade 3


This plot shows several important aspects of the grade 3, 4, and 5 GLP% test results. There is a grouping in the center of the plot, that is, around small changes; this is to be expected and reflects the realities mentioned above. Schools in the upper right-hand quadrant improved in both transitions, from 2014-15 to 2015-16 and again from 2015-16 to 2016-17. The lower left-hand quadrant contains schools that declined in both transitions, and so on. One might expect some schools to do remarkably better or worse as interventions, population shifts, staff changes, or finances change. However, the size of the “cloud”, where we see shifts up and down of over ten percentage points from year to year, requires explanation. Are the schools in the cloud distinguished by some internal or external circumstances? Are there additional influences that might be available in other data sources, or that are not quantitative? To what extent do large changes cast doubt on the utility of the tests and testing procedures?
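The quantities plotted in Figure II.2 amount to two consecutive year-over-year differences per school. This is a sketch with invented GLP% values; real school codes and percentages come from the NCDPI files:

```python
# GLP% by school code for three consecutive years (invented values).
glp = {
    "600406": {"2014-15": 55.0, "2015-16": 68.0, "2016-17": 51.0},
    "920457": {"2014-15": 62.0, "2015-16": 63.5, "2016-17": 64.0},
}

def consecutive_changes(by_year: dict) -> tuple:
    """Return (change from year 1 to 2, change from year 2 to 3),
    the x and y coordinates of one point in a Figure II.2-style plot."""
    years = sorted(by_year)  # academic-year labels sort chronologically
    y1, y2, y3 = (by_year[y] for y in years)
    return (y2 - y1, y3 - y2)

points = {code: consecutive_changes(vals) for code, vals in glp.items()}
# The first school swings widely and lands in the "cloud"; the second
# sits near the origin among the schools with small changes.
```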

The appearance of this data for specific LEAs is similar to that for the state as a whole. Figure II.3 shows a comparable plot for Wake County.

Figure II.3. Wake County Math Grade 3


The second question I address is how much confidence can be put in GLP% for the same school, for the same subject, when following a cohort from year to year. That is, when tracking grade 3 to grade 4, and 4 to 5, from year to year and looking at individual schools, how consistent are the GLP%? Figure II.4 (and for Wake in Figure II.5) shows that the appearance of cohort tracking resembles that for same-grade, shown in Figure II.2. It appears that the utility of GLP% comparisons for cohorts is limited.

Figure II.4. All Schools Math Cohort


Figure II.5. Wake County Math Cohort


Part III. Brief Remarks on Charter Schools

My analysis of charter schools is limited because it requires schools to have offered grades 3, 4, and 5 since 2014-15, which many charter schools have not. Given that restriction, I present in Figures III.1 and III.2 the same kinds of plots already seen in previous sections. More extensive plots can be found in the “Charters” Appendix.

Figure III.1. Charters Only

Figure III.2. Charters Only

Part IV. Conclusions

I have shown that the Grade Level Proficiency data made publicly available by NCDPI has utility when considered in its aggregations. However, when looked at in detail, this data is lacking in ways that might not be evident to policy makers and educators.

In order to move ahead in the analysis of this student achievement data, NCDPI should make publicly available student longitudinal data, which already exists and which is already FERPA compliant. At a minimum, school and grade/subject score-frequency data, categorized by EDS and ethnicity, should be made available.